Lab: Exploring Data

In this lab we explore a data set to understand its nature before we begin modeling.

Resources

Chapters 3 and 4 of the book cover a portion of this material and have additional information on autocorrelation. The following sections of the R For Spatial Statistics web site may also be of help:

Activities

Select a point data set located within the United States (covariate layers are hard to find outside the US) that includes a measured value for a response variable (presence/absence, height, DBH, weight, etc.). The data set will typically be point locations for a species, but it must also include a measurement.

If you have a data set that is not a set of points with continuous values, you can use one of the methods on the web page Converting Layers to change your data into a set of points. If you do not have a data set, you may use one of the ones below:

Collect a set of covariate layers for your data set. Save your point data set as a shapefile or CSV file, then extract the values from the covariate layers into it using ArcMap (or BlueSpray). Environmental variables such as precipitation and temperature are available on the Web from BioClim and PRISM.

1. Map your data set and covariates and perform a visual inspection of the data at various resolutions. For rasters, change the min/max range of displayed values in your GIS to find the precision of the values, and hillshade them to reveal artifacts. For points, use symbols to map the attribute values.

Question 1: Did you find anything interesting in your data at this point? How will these issues affect your modeling?

2. Create histogram(s) of the response variable(s).

Question 2: Do the histograms appear as you expect? Are there any artifacts that might impact your models? Did you find any other issues?
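
As a starting point for step 2, the following is a minimal sketch in R rather than a required method. The data frame points_df, its height column, and the -9999 no-data code are all hypothetical stand-ins; substitute your own file and field names. The sketch shows how a sentinel value can distort a histogram and how the picture changes once it is excluded.

```r
# Hypothetical stand-in for a real point data set;
# in practice you might load yours with read.csv("your_file.csv").
set.seed(42)
points_df <- data.frame(
  # 195 simulated heights plus 5 records carrying an assumed no-data code
  height = c(rlnorm(195, meanlog = 2, sdlog = 0.5), rep(-9999, 5))
)

# Summary statistics often expose sentinel values before a plot does.
print(summary(points_df$height))

# Histogram of the raw response, including the suspicious values.
h_raw <- hist(points_df$height, breaks = 30,
              main = "Raw response", xlab = "Height")

# Histogram after excluding the assumed no-data code.
valid <- points_df$height[points_df$height != -9999]
h_valid <- hist(valid, breaks = 30,
                main = "Response without no-data values", xlab = "Height")
```

Comparing the two plots is one way to decide whether an odd spike is a real feature of the data or an artifact such as a no-data code, a rounding step, or a detection limit.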

3. Create histograms (categorical response) and/or scatter plots (continuous response) of your covariate values against your measured field values.

Question 3: Do the covariates appear to "co-vary" with anything in the response variable? What type of "curve" might these be modeled with?
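
For step 3, one possible approach in base R is sketched below. The variables precip, height, and present are simulated and purely illustrative; the lowess smoother is an assumption added here as a visual aid for judging the shape of a relationship, not something the lab prescribes.

```r
# Simulated, hypothetical data: one covariate and two kinds of response.
set.seed(1)
n <- 200
precip  <- runif(n, 200, 1500)                          # covariate (e.g. mm/year)
height  <- 5 + 0.01 * precip + rnorm(n, sd = 2)         # continuous response
present <- rbinom(n, 1, plogis((precip - 800) / 200))   # presence/absence response

# Continuous response: scatter plot of response against covariate,
# with a smoothed trend line to suggest what curve might fit.
plot(precip, height, xlab = "Precipitation", ylab = "Height")
trend <- lowess(precip, height)
lines(trend, col = "red", lwd = 2)

# Categorical response: histograms of the covariate, one per class.
op <- par(mfrow = c(1, 2))
hist(precip[present == 1], main = "Present", xlab = "Precipitation")
hist(precip[present == 0], main = "Absent", xlab = "Precipitation")
par(op)
```

If the two class histograms sit over different ranges of the covariate, or the smoothed line bends, rises, or plateaus, that gives a hint about the functional form to try when modeling.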

4. Use the R function cor(vector1,vector2), or another method, to determine if there is a correlation between each pair of predictor variables. Also create a correlation plot between all of the predictors.

Question 4: Are there any strong correlations between the covariates?
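
Step 4 can be sketched as follows. The covariates data frame and its columns are simulated for illustration, and the 0.7 cutoff for a "strong" correlation is an arbitrary assumption, not a value given in the lab. The calls cor() and pairs() are part of base R.

```r
# Simulated, hypothetical covariates: temperature is built to fall with
# elevation, while precipitation is independent of both.
set.seed(7)
n <- 200
elev <- runif(n, 0, 3000)
covariates <- data.frame(
  elevation   = elev,
  temperature = 25 - 0.0065 * elev + rnorm(n, sd = 1),
  precip      = runif(n, 200, 1500)
)

# Correlation between two vectors, as in cor(vector1, vector2).
r_et <- cor(covariates$elevation, covariates$temperature)

# Correlation matrix for all predictors at once.
cor_matrix <- cor(covariates)
print(round(cor_matrix, 2))

# Scatter plot matrix as a simple correlation plot of every pair.
pairs(covariates, main = "Covariate scatter plot matrix")

# Flag each pair once whose absolute correlation exceeds the assumed cutoff.
strong <- which(abs(cor_matrix) > 0.7 & upper.tri(cor_matrix), arr.ind = TRUE)
print(strong)
```

Note that cor() computes the Pearson coefficient by default; passing method = "spearman" is an option when the relationships look monotonic but not linear.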

 

© Copyright 2018 HSU - All rights reserved.